Weighting Phone Confidence Measures for Automatic Speech Recognition
Abstract
One of the most useful applications of Confidence Measures (CMs) in Automatic Speech Recognition systems is the early detection of incorrect recognition hypotheses. A purely acoustic basis for such a CM is particularly important when tracking errors that result from out-of-vocabulary speech, background noise, or keyword substitution. A common approach is to compute scores on the subword units of the hypothesized words and combine them into a word score. This paper investigates the assumption that some subword types have stronger distinctive properties than others and that their scores should therefore contribute more to the final word score. Experiments on a connected digit recognition task showed a relative Confidence Error Rate improvement of 6% at the word level and 11% at the sentence level over the baseline CM, in which all phone confidence scores contribute equally.
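The idea of letting some phones count more than others in the word score can be illustrated with a small sketch. The abstract does not give the paper's combination formula, so the weighted average below is only an assumed form, and the function name, phone labels, scores, and weights are all hypothetical. With every weight equal, it reduces to the equal-contribution baseline.

```python
from typing import Dict, List, Tuple


def word_confidence(
    phone_scores: List[Tuple[str, float]],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> float:
    """Combine phone confidence scores into one word score.

    Assumed form (not taken from the paper): a weighted average in which
    each phone type has its own weight. Phones missing from `weights`
    fall back to `default_weight`.
    """
    if not phone_scores:
        raise ValueError("a word needs at least one phone score")

    weighted_sum = 0.0
    total_weight = 0.0
    for phone, score in phone_scores:
        w = weights.get(phone, default_weight)
        weighted_sum += w * score
        total_weight += w

    if total_weight == 0.0:
        raise ValueError("phone weights sum to zero")
    return weighted_sum / total_weight


# Hypothetical phone scores for one hypothesized digit word.
scores = [("s", 0.9), ("ih", 0.5), ("k", 0.7), ("s", 0.9)]

# Baseline: every phone contributes equally (plain mean).
baseline = word_confidence(scores, weights={})

# Weighted: the vowel is assumed to be more distinctive here.
weighted = word_confidence(scores, weights={"ih": 2.0})
```

In this made-up example the low-scoring vowel pulls the weighted word score below the baseline, so a rejection threshold between the two values would accept the word under the baseline and reject it under the weighted scheme.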
Related articles
Automatic out-of-language detection based on confidence measures derived from LVCSR word and phone lattices
Confidence Measures (CMs) estimated from Large Vocabulary Continuous Speech Recognition (LVCSR) outputs are commonly used metrics to detect incorrectly recognized words. In this paper, we propose to exploit CMs derived from frame-based word and phone posteriors to detect speech segments containing pronunciations from non-target (alien) languages. The LVCSR system used is built for English, whic...
VAD-measure-embedded decoder with online model adaptation
We previously proposed a decoding method for automatic speech recognition utilizing hypothesis scores weighted by voice activity detection (VAD)-measures. This method uses two Gaussian mixture models (GMMs) to obtain confidence measures: one for speech, the other for non-speech. To achieve good search performance, we need to adapt the GMMs properly for input utterances and environmental noise. ...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Revisiting Word Neighborhoods for Speech Recognition
Word neighborhoods have been suggested but not thoroughly explored as an explanatory variable for errors in automatic speech recognition (ASR). We revisit the definition of word neighborhoods, propose new measures using a fine-grained articulatory representation of word pronunciations, and consider new neighbor weighting functions. We analyze the significance of our measures as predictors of er...
Confidence measures for hybrid HMM/ANN speech recognition
In this paper we introduce four acoustic confidence measures which are derived from the output of a hybrid HMM/ANN large vocabulary continuous speech recognition system. These confidence measures, based on local posterior probability estimates computed by an ANN, are evaluated at both phone and word levels, using the North American Business News corpus.